Short-term forecasts of COVID-19 deaths in multiple countries
Introduction
As of 13-04-2020, more than 1,170,000 cases of COVID-19 have been reported across the world, with more than 111,000 deaths (1). This weekly report presents forecasts of the reported number of deaths in the week ahead and analysis of case reporting trends (case ascertainment) for 36 countries with active transmission.
The accuracy of these forecasts vary with the quality of surveillance and reporting in each country. We use the reported number of deaths due to COVID-19 to make these short-term forecasts as these are likely more reliable and stable over time than reported cases. In countries with poor reporting of deaths, these forecasts will likely represent an under-estimate while the forecasts for countries with few deaths might be unreliable. Our estimates of transmissibility reflect the epidemiological situation at the time of the infection of COVID-19 fatalities. Therefore, the impact of controls on estimated transmissibility will be quantifiable with a delay between transmission and death.
Summary
Forecasts and Transmissibility Estimates
Based on our best estimates of transmissibility, the COVID-19 epidemic is:
-
likely declining1 in 1 countries
(South Korea); -
likely stabilising2 in 19 countries
(Algeria, Austria, China, Czechia, Denmark, Dominican Republic, France, Germany, Iran, Italy, Morocco, Netherlands, Philippines, Poland, Portugal, Spain, Sweden, Switzerland, United Kingdom); -
likely growing slowly3 in 3 countries
(Ireland, Turkey, United States of America); -
likely growing rapidly4 in 13 countries
(Belgium, Brazil, Canada, Colombia, Ecuador, Egypt, India, Indonesia, Israel, Mexico, Peru, Romania, Russia).
Based on the central trends in the forecasts, the total number of reported deaths in the coming week is expected to be:
-
relatively small in 6 countries (<100 deaths)
(China, Czechia, Dominican Republic, Israel, Morocco, South Korea); -
relatively large in 20 countries (100 to 1000 deaths)
(Algeria, Austria, Canada, Colombia, Denmark, Ecuador, Egypt, India, Indonesia, Iran, Ireland, Mexico, Peru, Philippines, Poland, Portugal, Romania, Russia, Sweden, Switzerland), -
large in 7 countries (1,000 to 5,000 deaths)
(Belgium, Brazil, Germany, Italy, Netherlands, Spain, Turkey), -
very large in 3 countries (> 5,000 deaths)
(France, United Kingdom, United States of America).
- if the 97.5th percentile of \(R_t\) is below 1.
- if the 95% Credible Interval (CrI) of \(R_t\) contains 1.
- if the 2.5th percentile of \(R_t\) is greater than 1 and the 97.5th percentile is less than 2.
- if the 97.5th percentile of \(R_t\) is greater than 2.
Objectives and Caveats
The main objective in this report is to produce forecasts of the number of deaths in the week ahead for each country with active transmission.
We define a country as having active transmission if at least 100 deaths have been reported in a country so far, and at least ten deaths were observed in the country in each of the past two weeks. For the week beginning 12-04-2020, the number of countries/regions included based on these thresholds is 36.
We forecast the number of potential deaths as the reporting of deaths is likely to be more reliable and stable over time than the reporting of cases.
As we are forecasting deaths, the latest estimates of transmissibility reflect the epidemiological situation at the time of the infection of COVID-19 fatalities. Therefore, the impact of controls on estimated transmissibility will be quantifiable with a delay between transmission and death.
A secondary objective of this report is to analyse case ascertainment per country. As well as forecasting ahead, we use the number of reported deaths and of cases reported with a delay (delay from reporting to deaths, see Case Ascertainment method) to analyse the reporting trends per country. If the reporting of cases and deaths were perfect, and the delay between reporting and death is known, the ratio of deaths to delayed cases would equal the Case Fatality Ratio (CFR).
In this analysis, key assumptions are:
- The mean underlying CFR is 1.38% (95% Credible Interval (1.23 - 1.53)) (2),
- The delay from a case being reported to death follows a gamma distribution with mean 10 days, and standard deviation 2 days.
- All deaths due to COVID-19 have been reported in each country.
Projections and Effective Reproduction Number Estimates
Projections
Current and past forecasts
Note: The projections and estimates of \(R_t\) assume a constant reporting of death. That is, even if deaths are not under-reported, we assume a constant reporting rate over time. This assumption does not always hold. For instance, we note that in France, there was a large increase in the reported number of deaths towards the end of the week starting 30-03-2020. This is largely due to back-reporting of deaths outside hospital settings, and therefore, this is likely to have inflated the estimates of \(R_t\), leading to over-estimated forecasts in the following week [ECDC, personal communication].
Europe
Figure 1. Reported daily deaths and current forecasts based on the ensemble model. For each European country with active transmission (see Methods), we plot the observed incidence of deaths (black dots). Forecasts for the week ahead are shown in red (median and 95% CrI). Vertical dashed lines show the start and end of each week (Monday to Sunday). Projections and estimates of \(R_t\) for countries marked with a * are based on an unweighted ensemble of three models (Models 1, 2 and 3). For countries marked with ** ((Austria, Belgium, Denmark, France, Germany, Italy, Netherlands, Portugal, Spain, Sweden, and Switzerland), projections and estimates of \(R_t\) are based on an ensemble that includes 4 models (Models 1, 2, 3 and 4). For the United Kingdom, projections and \(R_t\) estimates from Model 4 are shown here. Results from individual models are shown in the section Methods.
Asia
Figure 2. Reported daily deaths and current forecasts based on the ensemble model. For each country in Asia with active transmission (see Methods), we plot the observed incidence of deaths (black dots). Forecasts for the week ahead are shown in red (median and 95% CrI). Vertical dashed lines show the start and end of each week (Monday to Sunday). Projections and estimates of \(R_t\) for countries marked with a * are based on an unweighted ensemble of three models (Models 1, 2 and 3, see Methods).
Africa
Figure 3. Reported daily deaths and current forecasts based on the ensemble model. For each country in Africa with active transmission (see Methods), we plot the observed incidence of deaths (black dots). Forecasts for the week ahead are shown in red (median and 95% CrI). Vertical dashed lines show the start and end of each week (Monday to Sunday). Projections and estimates of \(R_t\) for countries marked with a * are based on an unweighted ensemble of three models (Models 1, 2 and 3, see Methods).
North America
Figure 4. Reported daily deaths and current forecasts based on the ensemble model. For each country in North America with active transmission (see Methods), we plot the observed incidence of deaths (black dots). Forecasts for the week ahead are shown in red (median and 95% CrI). Vertical dashed lines show the start and end of each week (Monday to Sunday). Projections and estimates of \(R_t\) for countries marked with a * are based on an unweighted ensemble of three models (Models 1, 2 and 3, see Methods).
South America
Figure 5. Reported daily deaths and current forecasts based on the ensemble model. For each country in South America with active transmission (see Methods), we plot the observed incidence of deaths (black dots). Forecasts for the week ahead are shown in red (median and 95% CrI). Vertical dashed lines show the start and end of each week (Monday to Sunday). Projections and estimates of \(R_t\) for countries marked with a * are based on an unweighted ensemble of three models (Models 1, 2 and 3, see Methods).
Estimates of the current Effective Reproduction Number
Europe
Figure 6.: Latest estimates of effective reproduction numbers by country (median, inter-quartile range and 95% CrI) for each country in Europe with sustained transmission. Estimates of \(R_t\) for countries marked with a * are based on an unweighted ensemble of two models (Models 1 and 2; Model 3 does not estimate \(R_t\)). For countries marked with ** ((Austria, Belgium, Denmark, France, Germany, Italy, Netherlands, Portugal, Spain, Sweden, and Switzerland), projections and estimates of \(R_t\) are based on an ensemble that includes 3 models (Models 1, 2, and 4). For the United Kingdom, \(R_t\) estimates from Model 4 are shown here. Results from individual models are shown in the section Methods.
Asia
Figure 7.: Latest estimates of effective reproduction numbers by country (median, inter-quartile range and 95% CrI) for each country in Asia with sustained transmission. Estimates of \(R_t\) for countries marked with a * are based on an unweighted ensemble of two models (Models 1 and 2; Model 3 does not estimate \(R_t\)). Results from individual models are shown in the section Methods.
Africa
Figure 8.: Latest estimates of effective reproduction numbers by country (median, inter-quartile range and 95% CrI) for each country in Africa with sustained transmission. Estimates of \(R_t\) for countries marked with a * are based on an unweighted ensemble of two models (Models 1 and 2; Model 3 does not estimate \(R_t\)). Results from individual models are shown in the section Methods.
North America
Figure 9.: Latest estimates of effective reproduction numbers by country (median, inter-quartile range and 95% CrI) for each country in North America with sustained transmission. Estimates of \(R_t\) for countries marked with a * are based on an unweighted ensemble of two models (Models 1 and 2; Model 3 does not estimate \(R_t\)). Results from individual models are shown in the section Methods.
South America
Figure 10.: Latest estimates of effective reproduction numbers by country (median, inter-quartile range and 95% CrI) for each country in South America with sustained transmission. Estimates of \(R_t\) for countries marked with a * are based on an unweighted ensemble of two models (Models 1 and 2; Model 3 does not estimate \(R_t\)). Results from individual models are shown in the section Methods.
Summary of results
Table 1. Forecasted weekly death counts for week starting 2020-04-12, the observed number of deaths in the week before, and the estimated levels of transmissibility from the ensemble model for each country with active transmission (see Methods). For the forecasted weekly deaths counts and estimates of \(R_t\), the table shows the median estimate and the 95% CrI. The number of deaths has been rounded to 3 significant figures.
Analysis of Trends in Reporting
Temporal trend in the ratio of deaths to reported cases
Starting in March, we compute the average and 95% CI for the ratio of deaths to reported cases (with a mean 10-day delay) using a moving window of 7 days. The ratio accounts for the delay between death and case being reported. Any temporal trend in the ratio suggests a change in the reporting. For instance, an increase in the ratio gives an indication that cases reporting is decreasing. If all cases (including asymptomatic cases) and death were reported, then the ratio defined would be equivalent to the IFR.
Europe
Asia
Africa
North America
South America
Figure 11.: Temporal trends in the ratio of reported deaths to reported cases 10 days prior (medians and 95% CIs, solid lines and bands respectively). Also plotted are the reported deaths (red dots) and reported cases (black dots). The reported number of deaths and cases have been re-scaled so that the maximum recorded numbers of deaths or cases (with a mean 10-day delay) reaches 1.
Note that if deaths exceed the number of reported cases 10 days before, we set the ratio at 1 (95% CI 1-1)
Case Ascertainment
Case ascertainment was estimated based on the deaths in the previous 2 weeks and reported cases in the 10 days prior to that period. Estimates of case ascertainment were highly variable and, due to the underlying assumption of perfect reporting, are likely to be an underestimate. In particular, community deaths due to COVID-19 are likely under-reported (3).
Table 2. Results summary for reporting and case ascertainment. We present (i) the estimated ratio of deaths to reported cases 10 days prior for each country (mean and 95% CI) during the last 14 days; (ii) the estimated case ascertainment per country (mean and 95% CI) during the last 14 days. Estimated case ascertainment, factors to real size and predicted true number of cases assume perfect reporting of deaths.
Methods
We define a country to have active transmission if
- at least 100 deaths have been observed in the country so far; and
- at least ten deaths were observed in the country in the last two consecutive weeks.
We intend to produce forecasts every week, for the week ahead. Ensemble forecasts are produced from the outputs of three different models. We assume a gamma distributed serial interval with mean 6.48 days and standard deviation of 3.83 days following (4).
Serial Interval
We assume a gamma distributed serial interval with mean 6.48 days and standard deviation of 3.83 days following (4).
Model 1
The approach estimates the current reproduction number (the average number of secondary cases generated by a typical infected individual, \(R_t\)) and to use that to forecast future incidence of death. The current reproduction number is estimated assuming constant transmissibility during a chosen time-window (here, one week).
Estimating current transmissibility
Here we relied on a well-established and simple method (5) that assumed the daily incidence, It (here representing deaths), could be approximated with a Poisson process following the renewal equation (6):
\[I_t \sim Pois\left( R_t \sum_{s=0}^tI_{t-s}w_s\right)\]
where \(R_t\) is the instantaneous reproduction number and \(w\) is the serial interval distribution. From this a likelihood of the data given a set of model parameters can be calculated, as well the posterior distribution of \(R_t\) given previous observations of incidence and knowledge of the serial interval (7).
We used this approach to estimate \(R_t\) over three alternative time-windows defined by assuming a constant \(R_t\) for either the 2, 3 or 4 weeks prior to the most recent data-point. We made no assumptions regarding the epidemiological situation and transmissibility prior to each time-window. Therefore, no data prior to the time-window were used to estimate \(R_t\), and instead we jointly estimated \(R_t\) as well as back-calculated the incidence before the time-window. Specifically, we jointly estimated the \(R_t\) and the incidence level 100 days before the time-widow. Past incidence was then calculated using the known relationship between the serial interval, growth rate and reproduction number. The joint posterior distribution of \(R_t\) and the early epidemic curve (from which forecasts will be generated) were inferred using Markov Chain Monte Carlo (MCMC) sampling.
The model has the advantage of being robust to changes in reporting before the time-window used for inference.
Forward projections
We used the renewal equation (6) to project the incidence forward, given a back-calculated early incidence curve, an estimated reproduction number, and the observed incidence over the calibration period. We sampled sets of back-calculated early incidence curves and reproduction numbers from the posterior distribution obtained in the estimation process. For each of these sets, we simulated stochastic realisations of the renewal equation from the end of the calibration period leading to projected incidence trajectories.
Projections were made on a 7-day horizon. The transmissibility is assumed to remain constant over this time period. If transmissibility were to decrease as a result of control interventions and/or changes in behaviour over this time period, we would predict fewer deaths; similarly, if transmissibility were to increase over this time period, we would predict more deaths We limited our projection to 7 days only as assuming constant transmissibility over longer time horizons seemed unrealistic in light of the different interventions implemented by different countries and potential voluntary behaviour changes.
Model 2
Estimating current transmissibility
The standard approach to inferring the effective reproduction number at \(t\), \(R_t\), from an incidence curve (with cases at t denoted It) is provided by (7). This method assumes that \(R_t\) is constant over a window back in time of size k units (e.g. days or weeks) and uses the part of the incidence curve contained in this window to estimate \(R_t\). However, estimates of \(R_t\) can depend strongly on the width of the time-window used for estimation. Thus mis-specified time-windows can bias our inference. In (8) we use information theory to extend the approach of Cori et al. to optimise the choice of the time-window and refine estimates of \(R_t\). Specifically:
We integrate over the entire posterior distribution of \(R_t\), to obtain the posterior predictive distribution of incidence at time t+1 as P(It+1 | I1t) with I1t as the incidence curve up to t. For a gamma posterior distribution over \(R_t\) this is analytic and negative binomial ((8) for exact formulae).
We compute this distribution sequentially and causally across the existing incidence curve and then evaluate every observed case-count according to this posterior predictive distribution. For example at t = 5, we pick the true incidence value I5* and evaluate the probability of seeing this value under the predictive distribution i.e. P(I5 = I5* | I14).
This allows us to construct the accumulated predictive error (APE) under some window length k and under a given generation time distribution as:
\[\text{AP}E_{k} = \sum_{t = 0}^{T - 1}{- \log{P\left( I_{t + 1} = I_{t + 1}^{*}\ \right|\ I_{t - k + 1}^{t})\ \ }}\]
The optimal window length k* is then \(k^{*} = \arg{\min_{k}{\text{AP}E_{k}}}\). Here T is the last time point in the existing incidence curve.
Forward Projections
Forward projections are made assuming that the transmissibility remains unchanged over the projection horizon and same as the transmissibility in the last time-window. The projections are made using the standard branching process model using a Poisson offspring distribution.
Model 3
Objectives
- Estimate trends in case ascertainment and the ratio of deaths to reported cases.
- Use these to estimate the true size of the epidemic.
- Use these to forecast the number of deaths in the coming week.
Assumptions
We assume
- that deaths due to COVID-19 are perfectly reported;
- a known distribution for delay from report to death (gamma distribution with mean 10 days and standard deviation 2 days); and,
- a known distribution for CFR (2).
Let \(D_{i, t}\) be the number of deaths in location \(i\) at time \(t\). Let \(I_{i, t}^r\) be the reported number of cases in location \(i\) at time \(t\) and \(I_{i, t}^{true}\) be the true number of cases. We assume that the reporting to death delay \(\delta\) is distributed according to a Gamma distribution with mean \(\mu\) and standard deviation \(\sigma\). That is,
\[\delta \sim \Gamma(\mu, \sigma).\]
Let \(r_{i, t}\) be the ratio of deaths to reported cases in location \(i\) at time \(t\). We assume that deaths are distributed according to a Binomial distribution thus: \[ D_{i, t} \sim Binom\left( \int\limits_0^{\infty}{\Gamma(x \mid \mu, \sigma)I_{i, t - x}^{r}dx} , r_{i, \mu}\right). \]
This allows us to obtain a posterior distribution for \(r_{i, t}\).
Case ascertainment is defined as:
\[ \rho_{i, t} = \frac{CFR}{r_{i, t}}.\]
Thus a posterior distribution for \(\rho_{i, t}\) can be obtained using the posterior distribution for \(r_{i, t}\) and the posterior distribution for CFR.
Combining the CFR and the case ascertainment, we can estimate the true number of cases in the epidemic at any point.
For the period over which we have information on deaths i.e., up to time \(t - \mu\), we use the posterior distribution of CFR to obtain \(I_{i, t}^{true}\). The true number of cases in a location \(i\) at time \(t\) is the sum cases that did not die and the number of deaths. \[ I_{i, t}^{true} \sim D_{i, t - \mu} + NBin(D_{i, t - \mu}, CFR). \]
In this formulation, the negative binomial is parameterised as \(NBin(n, p)\) where \(n\) is the number of failure (i.e. death), and \(p\) is the probability of observing a failure.
For the period over which we do not have information on deaths i.e., after time \(t - \mu\), we use the posterior distribution of case ascertainment to obtain
\[ I_{i, t}^{true} \sim I_{i, t}^{r} + NBin(I_{i, t}^{r}, \rho_{i, t}). \]
To obtain forecast of deaths, we rely on reported cases to obtain \(\int\limits_0^{\infty}{\Gamma(x \mid \mu,\sigma)I_{i, t - x}^{r}dx}\). As cases reported in the coming week may die within the same week ( i.e. for \(x \in \{0,7\}\), \(\Gamma(x \mid \mu,\sigma) > 0\)), we estimate new reporting cases in the coming week by sampling from a Gamma distribution with mean and standard deviation estimated from the number of observed cases in the last week.
While this assumes no growth or decline in the coming week, this baseline assumtpion is justfiable as a null-hypthesis scenario as it does not influence our results dramatically given the contribution to deaths due to those being very small (i.e. less than 2%).
We therefore obtain the forecasted number of deaths as: \[ D_{i, t} \sim Binom\left( \int\limits_0^{\infty}{\Gamma(x \mid \mu, \sigma)I_{i, t - x}^{r}dx} , r_{i, \mu}\right). \]
where \(r_{i, \mu}\) is the estimated ratio of deaths to reported cases for the last week of data, and \(\int\limits_0^{\infty}{\Gamma(x \mid \mu,\sigma)I_{i, t - x}^{r}dx}\) relies on observed reported cases up to the last day with available and estimated reported cases as described above.
Model 4
Model 4 is a Bayesian model that calculates backwards from the deaths observed over time to estimate transmission that occurred several weeks prior. This model estimates the number of infections, deaths and the changes in transmissibility due to the non-pharmaceutical interventions for 12 European countries (Austria, Belgium, Denmark, France, Germany, Italy, Netherlands, Portugal, Spain, Sweden, Switzerland and United Kingdom). Details for this model and models results can be found here.
Ensemble Model
For the 12 European countries where we have results from Model 4 (Austria, Belgium, Denmark, France, Germany, Italy, Netherlands, Portugal, Spain, Sweden, Switzerland and United Kingdom), the ensemble model is an unweighted ensemble of Models 1, 2, 3 and 4. For all other countries, the ensemble model is built from Models 1, 2 and 3.
Individual Model Outputs
Projections
Europe
Figure 12. Projections (7-day ahead) for the week starting 2020-04-12 from individual models for each country in Europe with active transmission (see Methods). For countries marked with a *, projections from four models are available (Models 1, 2, 3 and 4). For countries marked with **, projections from Models 1, 2 and 3 are available. For each model, the solid line shows the median and the shaded region shows the 95% CrI of the projections.
Asia
Figure 13. Projections (7-day ahead) for the week starting 2020-04-12 from individual models (Models 1, 2 and 3) for each country in Asia with active transmission (see Methods). For each model, the solid line shows the median and the shaded region shows the 95% CrI of the projections.
Africa
Figure 14. Projections (7-day ahead) for the week starting 2020-04-12 from individual models (Models 1, 2 and 3) for each country in Africa with active transmission (see Methods). For each model, the solid line shows the median and the shaded region shows the 95% CrI of the projections.
North America
Figure 15. Projections (7-day ahead) for the week starting 2020-04-12 from individual models (Models 1, 2 and 3) for each country in North America with active transmission (see Methods). Model 4 did not include these countries. For each model, the solid line shows the median and the shaded region shows the 95% CrI of the projections.
South America
Figure 16. Projections (7-day ahead) for the week starting 2020-04-12 from individual models (Models 1, 2 and 3) for each country in South America with active transmission (see Methods). For each model, the solid line shows the median and the shaded region shows the 95% CrI of the projections.
Effective Reproduction Number
Europe
Figure 17. Estimates of \(R_t\) from individual models for each country in Europe with active transmission (see Methods) for the week starting 2020-04-12. For countries marked with a **, \(R_t\) estimates from three models are available (Models 1, 2, and 4). For countries marked with *, estimates from Models 1 and 2 are available.
Asia
Figure 18. Estimates of \(R_t\) from individual models for each country in Asia with active transmission (see Methods) for the week starting 2020-04-12. For countries marked with a *, \(R_t\) estimates from two models are available (Models 1 and 2).
Africa
Figure 19. Estimates of \(R_t\) from individual models for each country in Africa with active transmission (see Methods) for the week starting 2020-04-12. For countries marked with a *, \(R_t\) estimates from two models are available (Models 1 and 2).
North America
Figure 20. Estimates of \(R_t\) from individual models for each country in North America with active transmission (see Methods) for the week starting 2020-04-12. For countries marked with a *, \(R_t\) estimates from two models are available (Models 1 and 2).
South America
Figure 21. Estimates of \(R_t\) from individual models for each country in Africa with active transmission (see Methods) for the week starting 2020-04-12. For countries marked with a *, \(R_t\) estimates from two models are available (Models 1 and 2).
Code
All code used for this analysis can be found at: https://github.com/mrc-ide/covid19-forecasts-orderly